On Why Discretization Works for Naive-Bayes Classifiers
Authors
Abstract
We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive-Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under discretization. We suggest that the use of different discretization techniques can affect the classification bias and variance of the generated classifiers, an effect named discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error.
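As a rough illustration of the substitution the abstract describes, the sketch below (ours, not the paper's code) trains a single-attribute naive-Bayes classifier in which smoothed interval frequencies stand in for the unknown class-conditional density; the cut points are assumed given.

from collections import defaultdict
import bisect

def discretize(cut_points, x):
    # Index of the interval containing x (cut_points sorted ascending).
    return bisect.bisect_right(cut_points, x)

def train(xs, ys, cut_points):
    # Estimate P(C) and P(interval | C) with Laplace smoothing; these
    # interval frequencies replace the density p(x | C) in the naive-Bayes rule.
    k = len(cut_points) + 1
    class_n = defaultdict(int)
    cell_n = defaultdict(int)
    for x, c in zip(xs, ys):
        class_n[c] += 1
        cell_n[(c, discretize(cut_points, x))] += 1
    n = len(xs)
    prior = {c: class_n[c] / n for c in class_n}
    cond = {(c, i): (cell_n[(c, i)] + 1) / (class_n[c] + k)
            for c in class_n for i in range(k)}
    return prior, cond

def posterior(prior, cond, cut_points, x):
    # P(C | x) computed from the interval that x falls into.
    i = discretize(cut_points, x)
    score = {c: prior[c] * cond[(c, i)] for c in prior}
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}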
Similar resources
Non-Disjoint Discretization for Naive-Bayes Classifiers
Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of an interval to obtain more r...
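One plausible reading of this construction is sketched below under our own assumptions (equal-frequency atomic intervals, three atomic intervals per overlapping interval); neither figure is taken from the abstract.

import bisect

def atomic_cut_points(values, n_atomic):
    # Equal-frequency atomic boundaries over the sorted training values.
    s = sorted(values)
    step = len(s) / n_atomic
    return [s[int(i * step)] for i in range(1, n_atomic)]

def ndd_interval(cut_points, x, span=3):
    # Return (lo, hi) atomic-interval indices of the overlapping interval
    # whose middle atomic interval contains x, clamped at the extremes.
    n_atomic = len(cut_points) + 1
    span = min(span, n_atomic)
    mid = bisect.bisect_right(cut_points, x)
    lo = max(0, min(mid - span // 2, n_atomic - span))
    return lo, lo + span - 1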
Proportional k-Interval Discretization for Naive-Bayes Classifiers
This paper argues that two commonly-used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus...
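A minimal sketch of the proportional rule as we read it: interval size and interval count are set roughly equal, so both grow as the square root of the number of training instances. The equal-frequency boundary placement is our assumption.

import math

def pkid_cut_points(values):
    # Choose interval count t and size so that size * t ~ n and
    # size ~ t ~ sqrt(n), then place equal-frequency boundaries.
    s = sorted(values)
    n = len(s)
    t = max(1, int(math.sqrt(n)))
    size = n / t
    return [s[int(i * size)] for i in range(1, t)]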
Augmented Naive Bayesian Classifiers for Mixed-Mode Data
Conventional Bayesian networks often require discretization of continuous variables prior to learning. It is important to investigate Bayesian networks allowing mixed-mode data, in order to better represent data distributions as well as to avoid the overfitting problem. However, this attempt imposes potential restrictions on a network construction algorithm, since certain dependency has not bee...
Why Discretization Works for Naive Bayesian Classifiers
This paper explains why well-known discretization methods, such as entropy-based and ten-bin, work well for naive Bayesian classifiers with continuous variables, regardless of their complexities. These methods usually assume that discretized variables have Dirichlet priors. Since perfect aggregation holds for Dirichlets, we can show that, generally, a wide variety of discretization methods can...
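For reference, the Dirichlet aggregation property underlying this argument can be stated as follows (our notation, not the paper's). If $(\theta_1,\dots,\theta_k) \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_k)$, then merging two cells gives
$$(\theta_1+\theta_2,\theta_3,\dots,\theta_k) \sim \mathrm{Dir}(\alpha_1+\alpha_2,\alpha_3,\dots,\alpha_k),$$
and with observed counts $n_1,\dots,n_k$ the posterior estimate of the merged cell,
$$E[\theta_1+\theta_2 \mid n] = \frac{(\alpha_1+n_1)+(\alpha_2+n_2)}{\sum_j (\alpha_j+n_j)},$$
depends only on the aggregate count $n_1+n_2$, not on how the data fall inside the merged interval.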
Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers
The use of different discretization techniques can be expected to affect the classification bias and variance of naive-Bayes classifiers. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analys...
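In the same hedged spirit, the sketch below adds a minimum interval size to the proportional rule: below the floor, interval size is held fixed and only the interval count grows. The floor m = 30 and the equal-frequency placement are illustrative assumptions, not details taken from this (truncated) abstract.

import math

def wpkid_cut_points(values, m=30):
    # Solve size * count ~ n with size >= m; above the floor, size and
    # count grow together as sqrt(n), as in the proportional rule.
    s = sorted(values)
    n = len(s)
    size = max(m, math.sqrt(n))
    t = max(1, int(n / size))
    size = n / t
    return [s[int(i * size)] for i in range(1, t)]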
Publication date: 2003